PowerReplace interprets filter file line by line. A line must be as follow:
%comments-line
#define-line
first-stuff second-stuff
Comments-line has no effect. Define-line is used to set some option. The stuff-line means that the first-stuff will be replaced by the second-stuff everywhere in the text file. For the first-stuff, you can use the format:
"first-string"
$hex-string$
'pattern'
And for the second-stuff, you can use:
"second-string"
$hex-string$
In the next, first, we give some examples. Then we give a short description for each stuff. You can find at the end of this document some bad examples for understand better.
Note that a good way to learn is to study the sample filter files in the “Filter” folder.
Good Examples:
"é" "é" % OK. for HTML
"é" "é" % OK. for HTML
"à" "A" % OK. lower to upper
"è" "e" % OK. 8 bit to 7 bit
"è" "\\`e" % OK. for TeX. \\ means the character \
"//*\r" " " % OK. delete C++ comments
First-String:
- In the simple case, a first-string is a normal string without asterisk-sign(*). You must use meta character for give special characters in the string.
Example: "abc" and "ab\"ce" are simple-strings.
- In the more general case, a first-string may contain one asterisk-sign(*) in the middle of this string for any indeterminate substring.
For example, "abc*xyz" means all normal string beginning with "abc" and ending with "xyz".
An other example: "//*\n" means all characters after "//" of a line(C++ comments).
But either "*ab" or "a*b*c" isn’t double-string.
Second-String:
- In the simple case, a second-string is a normal string without asterisk-sign(*).
- In the more general case, a second-string may contain one tag \>. We’ll discuss this case later in the section “Insertion tag in the second-string.”
Character set:
We can tell PowerReplace the special characters by using meta character as:
\\ representing \
\" representing "
\* representing *
\t representing TAB
\n representing LF
\r representing CR
PowerReplace supports all character set in filter files, the control characters (0-31) in particular. We can specify characters in filters by using ASCII numbers in decimal, hexadecimal or octal base. For example, "\d065" "\x41" "\o101" (or "\101") all means the same character "A". Examples:
"\d65B" "B\101\x43" % OK. change "AB" to "BAC"
"\0" "" % OK. strip NULL
Define(#):
The default meta character used by PowerReplace is "\". You can change it. For example, to set "/" as meta character, just insert the following define-line in your filter file:
#meta "/"
If you don't use any meta character, insert the following define-line:
#meta ""
For example, to replace è by \\`e, we can use the following line:
"è" "\\`e" % OK. for TeX. \\ means the character \
or the following two lines:
#meta "/"
"è" "\`e" % OK. for TeX. now \ isn’t meta character!
I use the default meta character "\" in my documentation.
Hexadecimal string:
The hex-string must be enclosed by dollar-sign($hex-string$). You can use one of the following lines to convert "AB" to "BAC":
"AB" "BAC" % OK. change "AB" to "BAC"
$6566$ $666567$ % OK. change "AB" to "BAC"
$6566$ "BAC" % OK. change "AB" to "BAC"
"AB" $666567$ % OK. change "AB" to "BAC"
Regular expression (pattern):
This version supports Unix regular expression (pattern) for searching string. The pattern must be enclosed by single quotation marks ('pattern'). We can only use pattern for first-stuff but not for the second-stuff. Here is a small description of pattern supported by PowerReplace:
Uppercase and lowercase are always ignored. An ordinary character (not mentioned below) matches that character.
^ matches beginning of line
$ matches end of line
\ quotes character after it, whether special or not
. mathches any character
* a single character followed by * matches
zero or more occurrences of the character.
In particular, ".*" matches an arbitrary possibly empty string.
+ a single character followed by + matches
one or more occurrences of the character.
[ ] a set of characters in the set matches any single character in the set.
[c1-c2] matches any character of ascii ranging from c1 to c2.
[^set] matches any character not in set.
See also any Unix book for more information about pattern. Example:
'^ +' "" % OK. strip spaces at beginning of line.
Insertion tag in the second-string:
In the second-string, you can use a new tag \> for inserting a substring of the first-string-found (called “source”). We use two parameters (x,y) to define this substring: It takes x letters at the beginning and y letters at the ending of the source. By default, if (x,y) are missing, this substring is just the source.
Example1: My text is "hello". If the first-string is "e*o", then the source will be "ello". We study the following filter lines:
(1) "e*o" "\> and goodbye"
(2) "e*o" "\> and goodbye" (all, 0)
(3) "e*o" "\> and goodbye" (0, all)
(4) "e*o" "\> and goodbye" (all, all)
(5) "e*o" "\> and goodbye" (0, 0)
(6) "e*o" "\> and goodbye" (1, 0)
(7) "e*o" "\> and goodbye" (0, 2)
(8) "e*o" "\> and goodbye" (1, 1)
then, the output file contains the following text for each case:
(1,2,3) "hello and goodbye"
(4) "helloello and goodbye"
(5) "h and goodbye"
(6) "he and goodbye"
(7) "hlo and goodbye"
(8) "heo and goodbye"
Example2: We want to change "à" to "a'" if and only if it is the last letter of a word. The solution whitout the insertion tag is writing the following lines:
"à," "a',"
"à;" "a';"
"à:" "a':"
"à." "a'."
"à!" "a'!"
"à?" "a'?"
"à " "a' "
With the insert tag, you need only the following line:
"à[,;:\.!? ]" "a'\>" (0,1)
Or this line:
"à[^a-zA-Z]" "a'\>" (0,1)
Bad Examples:
'^ +' 'abc' % BAD. pattern at the second position
$A9876$ "" % BAD. bad hex-string, length odd
"a*b*c" "\xB3" % BAD. two * in the first-string, try "a*b\*c" "\xB3"
'[aeiou' " " % BAD. pattern syntax error
"a"a" "AA" % BAD. syntax error. try "a\"a" "AA"
"\da" "a" % BAD. decimal number error
$23" "98" % BAD. syntax error. try $23$ "98"
$A567BDG$ "98" % BAD. bad hex-string
"toto" "t*t" % BAD. bad second-string, try "toto" "t\*t"